Abstract

The native folding of certain zymogen-derived enzymes is completely dependent upon a prosegment domain to stabilize 
the folding transition state, thereby catalyzing the folding reaction. Generally little is known about how the prosegment 
accomplishes this task. It was previously shown that the prosegment catalyzes a late-stage folding transition between a 
stable misfolded state and the native state of pepsin. In this study, the contributions of specific prosegment residues to 
catalyzing pepsin folding were investigated by introducing individual Ala substitutions and measuring the effects on the 
bimolecular folding reaction between the prosegment peptide and pepsin. The effects of mutations on the free energies of 
the individual misfolded and native ground states and the transition state were compared using measurements of 
prosegment-pepsin binding and folding kinetics. Five out of the seven prosegment residues examined yielded relatively 
large kinetic effects and minimal ground state perturbations upon mutation, findings which indicate that these residues 
form strengthened and/or non-native contacts in the transition state. These five residues are semi- to strictly conserved, 
whereas the only residue with no kinetic effect is non-conserved. One conserved residue was shown to form native structure in the 
transition state. These results indicate that the prosegment, which is only 44 residues long, has evolved a high density of 
contacts that preferentially stabilize the folding transition state over the ground states. It is postulated that the prosegment 
forms extensive non-native contacts during the process of catalyzing correct inter- and intra-domain contacts during the 
final stages of folding. These results have implications for understanding the folding of multi-domain proteins and for the 
evolution of prosegment-catalyzed folding. 
Introduction 

Several general thermodynamic and kinetic features of protein folding are known: the natively folded conformation of a protein is thermodynamically stabilized [1], and there is a relationship between topology and folding rate that holds for both two-state and multi-state folding proteins [2,3]. However, zymogen-derived proteins that require a prosegment (PS) domain to catalyze folding, such as the serine peptidases αLP [4], SGPB [5], and subtilisin [6], and the aspartic peptidase pepsin [7], deviate from these common thermodynamic and kinetic trends to varying degrees. For example, as shown in Fig 1A, the native states of αLP, pepsin and SGPB are thermodynamically unstable (ΔG_N-I = −4 kcal/mol) [8], metastable (ΔG_N-I = −0.1 kcal/mol) [7] and marginally stable (ΔG_N-I = +1 kcal/mol) [9], respectively, relative to an intermediate state. Additionally, in the absence of the PS domain, these proteins fold much more slowly than would be estimated from their topology (Fig 1B).

When the PS is included, these deviations are corrected: the PS shifts the folding equilibrium towards the PS-native state complex and catalyzes folding by stabilizing the folding transition state (TS) (bold, blue line in Fig 1A). Once folding is complete, the PS is removed and the folding and unfolding activation barriers increase, leaving behind a kinetically trapped native state (black line in Fig 1A). There is an intriguing separation of low- and high-barrier folding landscapes, with and without the PS, respectively, and understanding how a PS domain stabilizes the folding TS should be informative for understanding kinetic folding/unfolding barriers in general, which remain poorly understood [10]. Despite several studies of PS-catalyzed folding [11-14], the mechanism by which a PS stabilizes the TS remains unknown.

Pepsin, which is derived from its zymogen pepsinogen, folds to a 
thermodynamically stable yet non-native form (termed refolded 
pepsin, Rp) upon removal from denaturing conditions [15]. Rp is 
inactive, contains native-like secondary and tertiary structure and 
has a greater thermal stability (ΔTm = +5°C) [15] and reduced 
picosecond diffusive motions when compared to native pepsin (Np) 
[16]. These features suggest that Rp is a late-stage folding 
intermediate, which in turn indicates that the PS operates late in 
the pepsinogen folding pathway. 
Figure 1. Zymogen-derived proteins deviate from common trends in protein folding. (A) Comparison of the non- and PS-catalyzed folding of αLP [8], SGPB [5] and pepsin [7]. (B) Relation between topology and folding rate for a number of two- and three-state folding proteins (circles, data taken from [3,21,22]). The folding rates of αLP (squares), SGPB (triangles) and pepsin (stars) are accelerated to the values (hollow points) expected from their topology only when the PS is included. (C) Reaction scheme of pepsin PS-catalyzed folding. The PS binds Rp and catalyzes its conversion to Np at pH 5.3, where the PS is a strong inhibitor of Np. The PS dissociates from Np at pH < 3.
doi:10.1371/journal.pone.0101339.g001



The folding of Rp to Np serves as a useful model for examining late-stage folding transitions between compact misfolded and native states under native conditions (i.e., pH 5.3 and no denaturants); such transitions are often difficult to access experimentally.
Indeed, the study of such misfolding events is often only possible at 
the single-molecule level [17]. Given the increased risk of 
misfolding in multi-domain proteins, via incorrect domain-domain 
contacts [18,19], it is prudent to examine the various mechanisms 
by which proteins have evolved to avoid such issues. 

The present study examined the energy landscape of PS-catalyzed pepsin folding by measuring the contributions of specific PS residues to stabilizing the folding TS and the ground states (Rp and Np). Our method took advantage of the bimolecular PS-catalyzed folding reaction of pepsin (Fig 1C). Synthetic PS peptide was added exogenously to Rp and Np, and the change in equilibrium stability of PS-Np relative to PS-Rp upon mutation, ΔΔG_PS(Np-Rp), was determined from the difference in the changes in binding energies, as shown in Fig S1. Changes in activation energy upon mutation, ΔΔG‡, were determined by measuring changes in the rate of PS-catalyzed folding.
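The conversions from measured constants to energy changes described above can be sketched as follows. This is a minimal illustration, assuming the standard relations ΔΔG = RT ln(K_mut/K_wt) for the dissociation constants (Kd for PS-Rp, Ki for PS-Np) and ΔΔG‡ = RT ln(kf,wt/kf,mut) for the PS-catalyzed folding rate constants; the function names and the default temperatures (20°C for binding, 15°C for folding, as in the Fig 2 caption) are illustrative choices rather than the authors' analysis code.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def ddg_from_ratio(numerator: float, denominator: float, temp_c: float) -> float:
    """Return RT ln(numerator / denominator) in kcal/mol at temp_c (deg C)."""
    return R_KCAL * (temp_c + 273.15) * math.log(numerator / denominator)


def mutation_energetics(kd_wt, kd_mut, ki_wt, ki_mut, kf_wt, kf_mut,
                        t_bind_c=20.0, t_fold_c=15.0):
    """Energy changes (kcal/mol) for one PS mutant; positive = destabilizing."""
    # A larger dissociation constant for the mutant means weaker binding.
    ddg_ps_rp = ddg_from_ratio(kd_mut, kd_wt, t_bind_c)  # from Kd (PS-Rp)
    ddg_ps_np = ddg_from_ratio(ki_mut, ki_wt, t_bind_c)  # from Ki (PS-Np)
    # Slower folding (kf_mut < kf_wt) means a higher barrier relative to PS-Rp.
    ddg_act = ddg_from_ratio(kf_wt, kf_mut, t_fold_c)
    return {
        "ddG_PS_Rp": ddg_ps_rp,
        "ddG_PS_Np": ddg_ps_np,
        "ddG_PS_Np_minus_Rp": ddg_ps_np - ddg_ps_rp,  # ΔΔG_PS(Np-Rp)
        "ddG_act": ddg_act,                            # ΔΔG‡
    }
```

For example, a mutation that weakens binding to both Rp and Np 7-fold without changing kf gives ΔΔG_PS-Rp = ΔΔG_PS-Np ≈ 1.13 kcal/mol and ΔΔG_PS(Np-Rp) = 0. Only ratios enter, so any consistent units for the constants can be used.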

A simple comparison of ΔΔG‡ and ΔΔG_PS(Np-Rp) yields the so-called Φ-value [20]. Φ-values are calculated as the ratio of the change in activation energy to the change in equilibrium stability (Φ = ΔΔG‡/ΔΔG_N-D) upon introducing a point mutation. Under the classical interpretation, a residue can belong to a region of either unfolded or native structure in the TS, giving rise to the limiting Φ-values of 0 or 1 [23]; however, fractional Φ-values are more commonly observed [24,25]. Φ-value analysis is often applied to studies of unimolecular folding [26-31], in which all energies are measured relative to one state, generally the unfolded state (and thus mutation effects on this state go unresolved [32]). Compared to Φ-value analysis of unimolecular folding, the bimolecular approach allows a comparison of the effects of point mutants on each individual ground state and the TS. The ΔΔG values obtained for each individual state are more informative than the Φ-values derived from them, yet Φ-values are also reported here as a point of comparison.
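As a minimal sketch of the definition above, Φ is the ratio of the two energy changes. The classify_phi helper is hypothetical: it bins a value into the classical categories, and its tolerance of 0.15 is an arbitrary choice, not a threshold used in this study.

```python
def phi_value(ddg_act: float, ddg_eq: float) -> float:
    """Phi = ΔΔG‡ / ΔΔG_eq, with both energy changes in kcal/mol."""
    if ddg_eq == 0:
        raise ZeroDivisionError("Phi is undefined when the equilibrium change is zero")
    return ddg_act / ddg_eq


def classify_phi(phi: float, tol: float = 0.15) -> str:
    """Bin a Phi-value into the classical categories (tol is an arbitrary choice)."""
    if phi < -tol or phi > 1 + tol:
        return "abnormal"      # outside the classical 0-1 range
    if abs(phi) <= tol:
        return "near 0"        # TS resembles the reference (e.g. unfolded) state here
    if abs(phi - 1) <= tol:
        return "near 1"        # TS resembles the native state here
    return "fractional"
```

In the bimolecular scheme used here, the reference state for Φ = 0 is PS-Rp rather than an unfolded state.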

Application of Φ-value analysis to the PS-catalyzed folding of pepsin yielded predominantly abnormal Φ-values (Φ > 1 or Φ < 0), reflecting the finding that most of the mutations destabilized the PS-TS complex more than either PS-Rp or PS-Np. This greater sensitivity of the PS-TS complex to perturbation likely indicates strong non-native interactions, reduced conformational strain in the TS, or a combination of both.

Results 

Selection of point mutations 

PS mutations were chosen on the basis of sequence conservation and the available crystal structures of the zymogens porcine pepsinogen and human progastricsin (Fig S2). As proteins are believed to have evolved stable, mutually supportive native contacts to avoid degeneracy on the folding landscape (i.e., minimally frustrated contacts) [33], highly conserved residues may be particularly important for proper folding. A conserved-domain search within the Pfam database [34] revealed that PS residues 1-29 of pepsinogen correspond to the A1 propeptide conserved motif, which is found in 402 sequences from 107 animal species. Pepsinogen homologues were identified using a sequence similarity search and then aligned, focusing on the 44-residue PS domain, revealing that PS residues L6 and R13 are strictly conserved (Fig S3). Only the R13 and K36 side-chains of the PS form hydrogen bonds with the mature pepsin domain: R13 with pepsin D11, and K36 with the catalytic residues of pepsin, D32 and D215. Seven residues were chosen for mutation, shown in Fig 2A: the strictly conserved L6 and R13, the semi-conserved V4, S11, F25 and K36, and the non-conserved I17. F25 was chosen primarily to probe the effect of mutating the second α-helical segment, which makes no direct contacts with the pepsin domain, so that the N-terminal β-strand and all three α-helical segments of the PS were probed with mutations.

The residues were replaced with Ala in the corresponding seven synthetic peptides. Ala substitutions have been shown to be generally conservative mutations that do not introduce non-native interactions, which would further complicate data analysis, while still providing a measurable destabilization [32].



Effects of PS mutants on binding and folding catalysis 

The PS-catalyzed folding and binding data are summarized in Figs 2 and S4 and Table 1. All of the mutants markedly slowed the rate of PS-catalyzed folding except for I17A, which had no effect. The L6A mutant gave the slowest folding rate, while the other mutants resulted in similar rates. The narrow range of effects that the mutations had on the folding rate suggests that the PS stabilizes the folding TS via contacts made along an extensive portion of the PS rather than via localized contacts. For some of the mutants, e.g. S11A, a burst phase in the folding kinetics was noticeable (Fig S4), although the basis for this feature is not yet clear. Generally, the PS mutants had a small impact on PS affinity for Rp. One exception was R13A, which reduced the binding affinity 7-fold, while the other mutants had modest effects (2- to 3-fold reduction in affinity) or no effect at all (I17A and F25A were similar to PS_wt). The mutations resulted in a wider distribution of affinities for Np, measured by PS inhibition of Np. In this case, V4A was similar to PS_wt, while R13A and I17A resulted in the largest reductions in affinity for Np.
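To place these fold-changes on an energy scale, one can assume the standard relation between a change in dissociation constant and a change in binding free energy at the binding-assay temperature of 20°C (T = 293.15 K, so RT ≈ 0.583 kcal/mol). The arithmetic below is a worked illustration, not values taken from Table 1:

```latex
\Delta\Delta G_{\mathrm{PS\text{-}Rp}} = RT\ln\frac{K_d^{\mathrm{mut}}}{K_d^{\mathrm{wt}}},
\qquad
0.583\ln 7 \approx 1.13~\mathrm{kcal\,mol^{-1}},
\qquad
0.583\ln 2 \approx 0.40,\quad 0.583\ln 3 \approx 0.64~\mathrm{kcal\,mol^{-1}}
```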




Figure 2. Effects of PS point mutants on binding and catalyzing pepsin folding. (A) Structure of pepsinogen (PDB code: 3PSG) with the PS (pink) located between the N- and C-terminal lobes, forming part of a six-stranded β-sheet, and K36 of the PS interacting with the catalytic residues, D32 and D215 (red). PS residues selected for mutation to Ala are shown in space-filling form and coloured according to type (grey, hydrophobic; orange, polar; blue, basic; red, acidic). (B) Comparison of wild-type and mutant PS-catalyzed folding of pepsin. The rate of PS-catalyzed folding (kf) was determined by adding PS to Rp at pH 5.3, 15°C (see Text S2: the folding rate followed Arrhenius temperature dependence from 0-15°C, shown in Fig S5), and measuring the formation of Np based on enzyme activity measured at pH 1.2, 25°C. The data were fit to a monoexponential function to obtain kf. (C) Comparison of wild-type and mutant PS affinity for Rp. PS-Rp binding was determined by following the increase in Trp fluorescence of pepsin as a function of [PS]. The data were fit according to eq 1 to determine the dissociation constant, Kd, at 20°C, pH 5.3. (D) Comparison of wild-type and mutant PS affinity for Np. The reduction in Np activity was measured as a function of [PS]. The data were fit according to a competitive inhibitor model, eq 2, to determine the inhibition (dissociation) constant, Ki, at 20°C, pH 5.3. All data are reported as the average ± SD of 3-5 measurements for each PS peptide.
doi:10.1371/journal.pone.0101339.g002 
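The three fits described in this caption can be sketched with linearized forms. Equations 1 and 2 themselves are given in the Methods and are not reproduced in this section, so the models below are common textbook forms assumed for illustration: a monoexponential rise with a known end-point activity, single-site binding with PS in large excess over pepsin, and classical competitive inhibition. The function names are hypothetical.

```python
import math
from typing import Sequence


def slope_through_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope b for the model y = b * x."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


def fit_kf(times, activity, activity_inf):
    """Monoexponential A(t) = A_inf * (1 - exp(-kf t)), with A_inf assumed known.

    Linearized as -ln(1 - A/A_inf) = kf * t (requires A < A_inf).
    """
    y = [-math.log(1.0 - a / activity_inf) for a in activity]
    return slope_through_origin(times, y)


def fit_kd(ligand, fraction_bound):
    """Single-site binding with ligand in excess: theta / (1 - theta) = [L] / Kd."""
    y = [f / (1.0 - f) for f in fraction_bound]
    return 1.0 / slope_through_origin(ligand, y)


def fit_ki(inhibitor, relative_activity, substrate, km):
    """Competitive inhibition: v0/v - 1 = [I] / (Ki * (1 + [S]/Km))."""
    y = [1.0 / v - 1.0 for v in relative_activity]
    ic50 = 1.0 / slope_through_origin(inhibitor, y)
    return ic50 / (1.0 + substrate / km)
```

Linearization keeps the sketch dependency-free, but it reweights experimental noise; a nonlinear least-squares fit of the untransformed data would normally be preferred for real measurements.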
Changes in the PS-catalyzed folding energy landscape

The changes in PS-Rp and PS-Np binding energy upon mutation (ΔΔG_PS-Rp and ΔΔG_PS-Np) were obtained from measurements of Kd and Ki, respectively. The change in PS-TS binding energy (ΔΔG_PS-TS) was taken as the change in the folding activation energy, ΔΔG‡, added to ΔΔG_PS-Rp, because ΔΔG‡ was determined relative to PS-Rp. The changes in binding and folding activation energies are given in Table 1. The effects of each PS mutation on the PS-catalyzed folding landscape are readily compared by plotting the changes in binding energies of the denatured (PS-Rp), transition (PS-TS) and native (PS-Np) states, as shown in Fig 3A.
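The bookkeeping described above, in which ΔΔG_PS-TS is assembled from ΔΔG‡ and ΔΔG_PS-Rp, can be sketched as follows. The function names are hypothetical, and any numbers passed in are placeholders rather than data from Table 1.

```python
def ps_ts_change(ddg_act: float, ddg_ps_rp: float) -> float:
    """ΔΔG_PS-TS = ΔΔG‡ + ΔΔG_PS-Rp, since ΔΔG‡ is measured relative to PS-Rp."""
    return ddg_act + ddg_ps_rp


def folding_landscape(ddg_ps_rp: float, ddg_act: float, ddg_ps_np: float) -> dict:
    """Binding-energy change of each state for one mutant (kcal/mol)."""
    return {
        "PS-Rp": ddg_ps_rp,
        "PS-TS": ps_ts_change(ddg_act, ddg_ps_rp),
        "PS-Np": ddg_ps_np,
    }


def most_destabilized(landscape: dict) -> str:
    """Name of the state whose binding energy increased the most."""
    return max(landscape, key=landscape.get)
```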

PS-Rp was destabilized by mutations at V4, L6, S11 and R13, which correspond to the N-terminal β-strand and α-helix 1 of the PS within pepsinogen (Fig 2A), indicating that this region may play a dominant role in defining the initial PS-Rp complex. Conversely, both I17A and F25A had a negligible effect on PS-Rp binding, indicating that the PS is likely unstructured in this region within PS-Rp. All of the mutations except V4A, R13A and K36A were more destabilizing to PS-Np than to PS-Rp. V4A had a very small stabilizing effect on PS-Np and a relatively large destabilizing effect on PS-Rp. Conversely, R13A destabilized both the PS-Np and PS-Rp complexes; the similar magnitudes of ΔΔG_PS-Rp and ΔΔG_PS-Np suggest that R13 has similar contacts in both PS-Rp and PS-Np. K36A was also nearly equally destabilizing to PS-Rp and PS-Np, suggesting a similar structure in both complexes, although overall K36A had less of an impact on binding to Rp and Np than R13A.

Figure 3. Changes in the PS-catalyzed folding energy landscape upon mutation of the PS peptide. (A) The changes in energy of each conformation were determined as changes in binding energies. (B) Φ-values calculated from the ratio of the changes in activation energy (ΔΔG‡) and in the free energy difference between PS-Np and PS-Rp (ΔΔG_PS(Np-Rp)). Error bars show ± SD derived by propagation of errors.
doi:10.1371/journal.pone.0101339.g003

With the exception of I17A, the mutations were most 
destabilizing to the PS-TS complex, particularly the mutation of 
the strictly conserved R13. I17A had a negligible effect on PS-Rp 
and PS-TS stability, yet was one of the most destabilizing 
mutations to the PS-Np complex, indicating that this non- 
conserved residue makes no contribution to catalyzing folding 
but contributes to driving the equilibrium towards PS-Np. 

Φ-values were obtained from the data in Fig 3A by subtracting ΔΔG_PS-Rp from both ΔΔG_PS-TS (to obtain ΔΔG‡) and ΔΔG_PS-Np (to obtain ΔΔG_PS(Np-Rp)), and then dividing the first result by the second (Fig 3B). A Φ-value close to 0 or 1 would indicate that, in the PS-TS complex, a PS residue adopts a conformation identical to that in PS-Rp or PS-Np, respectively. As seen in Table 1 and Fig 3B, most of the mutants gave rise to large positive or negative Φ-values owing to relatively small ΔΔG_PS(Np-Rp) and large ΔΔG‡ values. L6A, R13A and K36A, in particular, yielded exceptionally large Φ-values (with correspondingly large errors) because ΔΔG_PS(Np-Rp) is close to zero while ΔΔG‡ is ~1 kcal/mol. As discussed below, these values likely reflect the formation of non-native interactions and/or reduced conformational strain in the TS.
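The procedure described above, recovering Φ from the three state-specific binding-energy changes plotted in Fig 3A, can be written as a short sketch (hypothetical function name; energies in kcal/mol).

```python
def phi_from_binding_changes(ddg_ps_rp: float, ddg_ps_ts: float,
                             ddg_ps_np: float) -> float:
    """Phi from the three state-specific binding-energy changes (kcal/mol)."""
    ddg_act = ddg_ps_ts - ddg_ps_rp   # recovers ΔΔG‡
    ddg_eq = ddg_ps_np - ddg_ps_rp    # ΔΔG_PS(Np-Rp)
    return ddg_act / ddg_eq
```

Because only differences enter, adding the same amount to all three ΔΔG values leaves Φ unchanged, whereas a denominator ΔΔG_PS(Np-Rp) near zero, as for L6A, R13A and K36A, makes the ratio very large.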

Discussion 

PS stabilizes TS independently of ground states 

For small, two-state folding proteins, a predominance of low fractional Φ-values is interpreted as a diffuse TS with weakened native-like structure, while Φ-values clustering towards low and high fractional values are interpreted as a polarized TS structure, with some regions forming native-like contacts and others being unfolded [35]. The Φ-values presented for PS-pepsin show a very different trend: instead of fractional values, five of the seven residues were characterized by large positive or negative Φ-values, indicating a highly structured TS with strengthened interactions. A useful means of comparing the kinetic effect of various mutations is a Brønsted plot [35], as shown in Fig 4. The increase in folding activation energy occurs independently of the ground-state perturbation, indicating that all of the PS residues examined, except for I17, play a common role in defining the folding barrier. The R13A and K36A mutations gave particularly large kinetic effects, with ΔΔG_PS(Np-Rp) close to 0 and ΔΔG‡ of ~1 kcal/mol. Because such strong kinetic effects were seldom observed in a comparison of hundreds of mutations from various small, single-domain proteins [24,35], these data support the idea that the PS plays a unique role in stabilizing the folding TS.
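On a Brønsted plot, Φ = 0 corresponds to the horizontal line ΔΔG‡ = 0 and Φ = 1 to the diagonal ΔΔG‡ = ΔΔG_PS(Np-Rp). One simple way to quantify the statement that ΔΔG‡ rises independently of the ground-state perturbation is the least-squares slope of ΔΔG‡ against ΔΔG_PS(Np-Rp): a slope near zero combined with points lying well above the origin. The sketch below illustrates that idea; it is not an analysis reported in this study.

```python
def bronsted_slope(ddg_eq, ddg_act):
    """Ordinary least-squares slope of ΔΔG‡ (y) versus ΔΔG_eq (x)."""
    n = len(ddg_eq)
    if n < 2 or n != len(ddg_act):
        raise ValueError("need at least two paired points")
    mean_x = sum(ddg_eq) / n
    mean_y = sum(ddg_act) / n
    sxx = sum((x - mean_x) ** 2 for x in ddg_eq)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ddg_eq, ddg_act))
    return sxy / sxx
```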

Physical basis for large kinetic effects and abnormal Φ-values

The perturbations introduced by each mutation are characterized in most detail by the individual binding-energy changes, ΔΔG_PS-Rp, ΔΔG_PS-Np and ΔΔG_PS-TS (Fig 3A). Ala-scanning allowed for the identification of key residues that provide extra PS-TS stabilization, yet the nature of these interactions remains open to speculation. Because the mutations have larger, and in some cases opposite, effects on the TS compared to the ground states (Fig 4), they give rise to Φ-values that fall outside the typical range of 0 to 1. When interpreting the large kinetic effects observed in PS-catalyzed pepsin folding, it is worth considering previous reports of abnormal Φ-values determined for unimolecular folding.

[Figure 4 plot: ΔΔG‡ versus ΔΔG_PS(Np-Rp), both in kcal mol⁻¹, for each PS mutant; labeled points include L6A, V4A, K36A and I17A]

Figure 4. Brønsted plot. A comparison of the mutation effects on the folding activation energy as a function of the change in equilibrium stability. Dashed lines indicate the trend lines for ΔΔG values that would give rise to Φ-values of 0 or 1, and error bars show ± SD.
doi:10.1371/journal.pone.0101339.g004

Although Φ-values outside the range 0 to 1 account for as much as 10 to 20% of those reported [23], it has been argued that many of these unusual Φ-values are not reliable, as they are associated with small ΔΔG_N-D and ΔΔG‡ values [35]. In practice, the lower limit of ΔΔG_N-D from which reliable Φ-values may be calculated differs among reports, ranging from 1.7 [35] to 0.6 [36] to 0.2 kcal/mol [30]. Fortunately, the analysis of pepsin PS-catalyzed folding did not rely on direct measurements of the generally small values of ΔΔG_PS(Np-Rp), as the individual binding energies provide specific detail on the perturbations introduced upon mutation to each of the PS-Rp and PS-Np complexes (Fig 3A). Even in the cases where ΔΔG_PS(Np-Rp) is close to 0 (e.g., L6A, R13A and K36A), the ΔΔG‡ values are substantial (~1 kcal/mol), so the associated Φ-values can be reliably classified as 'abnormal' (outside the range 0 to 1), even if they cannot be measured quantitatively owing to the correspondingly large errors. For example, R13A yields a Φ-value of −21 ± 67 that is statistically unreliable, yet the underlying comparison that the Φ-value represents, ΔΔG‡ = 1.02 ± 0.15 > ΔΔG_PS(Np-Rp) ≈ −0.05 ± 0.15 (all units in kcal/mol), is reliable.
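The instability of Φ when ΔΔG_PS(Np-Rp) is near zero follows from standard first-order error propagation for a ratio, sketched below under the assumption that the errors in ΔΔG‡ and ΔΔG_PS(Np-Rp) are independent. With the rounded R13A values quoted above it returns roughly −20 ± 61; the reported −21 ± 67 presumably reflects unrounded inputs.

```python
import math


def phi_with_error(ddg_act, sd_act, ddg_eq, sd_eq):
    """Phi and its SD by first-order propagation, assuming independent errors."""
    phi = ddg_act / ddg_eq
    rel_var = (sd_act / ddg_act) ** 2 + (sd_eq / ddg_eq) ** 2
    return phi, abs(phi) * math.sqrt(rel_var)


# R13A, using the rounded values quoted in the text (kcal/mol)
phi, sd = phi_with_error(1.02, 0.15, -0.05, 0.15)
print(f"Phi = {phi:.1f} +/- {sd:.0f}")  # prints: Phi = -20.4 +/- 61
```

Whenever |ΔΔG_PS(Np-Rp)| is smaller than its own SD, the relative error on Φ exceeds 100%, which is why the direct comparison of ΔΔG‡ with ΔΔG_PS(Np-Rp) is the more robust statement.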

It was previously observed that abnormal O-values occur more 
frequently for mutations that could lead to changes in stability or 
configurational dynamics of the denatured state, such as mutation 
of charged and polar residues or of Ala and Gly [37]. As the pepsin 
PS is unstructured on its own, as verified by CD (Fig S6), 
mutations would have minimal effects on the structure and 
stability of the PS, and the measured changes in binding energies 
can be ascribed entirely to changes in PS-pepsin interactions. 

An analysis of 806 mutants from 24 proteins indicated that Φ-values are strongly influenced by packing density and local interactions [24]. Residues at the surface tend to make fewer and more localized contacts than internal residues and thus can adopt a native-like structure when only a few local contacts are formed. Furthermore, the ΔΔG_N-D values for mutations at locations with few contacts are generally smaller than those for buried residues with many native contacts. Thus, mutations of surface residues result in both smaller ΔΔG_N-D and larger Φ-values, while mutations at core residues tend to give larger ΔΔG_N-D and smaller Φ-values [24,37]. The PS residues examined in the present study are mostly buried from the solvent yet are not core residues, as the PS sits at the surface (Fig 3A). Fewer native contacts made by PS residues could explain the relatively small ΔΔG_PS(Np-Rp) values, yet they do not explain the larger effects on ΔΔG‡.

Φ-values outside the classical range of 0 to 1 are the result of opposite or larger energetic effects in the TS than in the native state, although the microscopic basis for this is not certain. One hypothesis is that these unusual Φ-values arise from alternative flow channels down the folding funnel [38], with alternative folding paths (and thus a different TS to be crossed) becoming more predominant upon introduction of a point mutation. While the notion of alternative flow channels is consistent with the view of a funneled folding landscape, the only supporting evidence to date has come from native-centric lattice Gō models [38]. An alternative interpretation is that unusual Φ-values can arise when side-chains form non-native contacts in the TS [39,40]. Both experiments and simulations have shown that non-native interactions can accelerate or decelerate folding [39-43]. Suspected non-native interactions were found to involve both hydrophobic and electrostatic interactions, and these can stabilize or destabilize the TS [26,40,44]. In addition to non-native interactions, unusual Φ-values may arise upon mutation of a group that experiences different conformational strain in the TS and native state. Mutations that change the size of hydrophobic side-chains can stabilize the TS (by optimizing side-group packing) while destabilizing the native state, owing to the different compactness of the TS, thereby producing negative Φ-values [30]. Similarly, mutations that stabilize the native state yet destabilize the TS owing to differences in conformational strain also produce negative Φ-values [45]. Conformational strain present in the TS but not in the native state can also give rise to Φ-values > 1 [46]. In PS-pepsin, the V4A mutation was stabilizing to the native state and destabilizing to the TS, suggestive of a slightly frustrated or overly packed native state and a more optimally packed TS. In fact, all the anomalous Φ-values for PS-pepsin (both negative and positive) resulted from ΔΔG‡ > ΔΔG_PS(Np-Rp).

The latter result is consistent with the finding that the PS has picomolar affinity for the TS compared to nano- and micromolar affinity for Np and Rp, respectively [7]. It seems likely that the PS has achieved this higher affinity for the TS via a concerted optimization of side-chain packing, hydrogen bonding and electrostatics, as evidenced by polar, hydrophobic and charged groups selectively stabilizing the TS (Fig 3A). Furthermore, this concerted optimization likely involves both non-native and strengthened native contacts, given the range of interactions involved. In the native fold [47], PS residues R8, R13 and K36 form ion pairs with pepsin residues E13, D11 and D32/D215, respectively, and these may be optimized in PS-TS. S11 is flanked on either side by R8 and R13 and thus may influence the strength of these interactions. V4 and L6 are likely more optimally packed in the TS than in PS-Rp and PS-Np, in line with previous evidence that non-native packing accelerates folding [39-43]. Further insight into the nature of these contacts may be gleaned from future studies involving double-mutant cycles (e.g., to study the influence of ion pairs) and the systematic reduction of side-chain size (e.g., Val → Ala → Gly); both approaches have been used previously [20,26,27,40].
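A double-mutant cycle of the kind proposed above measures the coupling energy between two sites as the deviation of the double mutant from additivity. The sketch below shows the standard calculation, both from energy changes and directly from folding rate constants; the function names, the 15°C default and any input values are illustrative assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def coupling_energy(ddg_x: float, ddg_y: float, ddg_xy: float) -> float:
    """ΔΔG_int = ΔΔG_XY - (ΔΔG_X + ΔΔG_Y); ~0 means the two sites act independently."""
    return ddg_xy - (ddg_x + ddg_y)


def kinetic_coupling(k_wt, k_x, k_y, k_xy, temp_c=15.0):
    """Same quantity from folding rate constants, using ΔΔG‡_i = RT ln(k_wt / k_i)."""
    rt = R_KCAL * (temp_c + 273.15)
    # ΔΔG‡_XY - ΔΔG‡_X - ΔΔG‡_Y simplifies to RT ln(k_x * k_y / (k_wt * k_xy)).
    return rt * math.log((k_x * k_y) / (k_wt * k_xy))
```

A positive coupling energy means the double mutant slows folding more than the two single mutants would predict together.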

Coarse-grain simulations have been used to understand the nature of non-native interactions in small, single-domain proteins lacking stable folding intermediates [40,48,49]. To our knowledge, such approaches have not yet been applied to larger proteins, although the folding of DehI, a 311-residue protein with a knotted fold, was simulated using a native-centric Gō model that did not include non-native contacts [50]. It would be challenging, but not impossible, to use similar coarse-grain approaches to model the folding of pepsinogen (370 residues). Using simulation to gain insight into the nature of the non-native contacts formed within PS-TS would be greatly facilitated by knowing the high-resolution structure of PS-Rp. This would require characterizing the PS-Rp complex before it folded to PS-Np. The data presented here (Fig 3A) indicate that PS-Rp could be 'trapped' for further structural analysis by using a PS with a double mutation (such as PS_I17A/F25A), which would be expected to shift the folding equilibrium from PS_I17A/F25A-Np to PS_I17A/F25A-Rp. This hypothesis was recently confirmed using 1H-15N TROSY NMR, which showed that PS_I17A/F25A-Rp is structurally very similar to Rp alone [51].

Only I17A (Φ = 0) and F25A (Φ = 1.1) gave typical Φ-values, indicating that I17 adopts native-like structure after formation of the TS, while F25 adopts native-like structure during formation of the TS. F25 is located on the second α-helix of the PS, which runs across the top of the active-site cleft in pepsinogen (Figs 2A and S2); the C-terminus of this helix (residue 29) marks the end of the conserved A1 propeptide motif [34]. The Φ-value close to 1 for F25A indicates that this α-helical segment may be structured in the TS, suggesting its importance to PS-catalyzed folding.

PS stabilizes a late-stage transition between compact misfolded and native states

For pepsin [7,15], αLP [4,8], and SGPB [5], the PS catalyzes folding from a compact, well-structured denatured state, indicating that the PS acts at a late stage in the folding process. In the case of pepsin, Rp was characterized by a ΔG_unf of 5.8 kcal/mol, a 10% increase in unordered secondary structure and tertiary structure identical to that of Np (both yield Rg ~20 Å) [15]. Similarly, the stable denatured state of αLP was found to have secondary and tertiary structures intermediate between the native and unfolded forms, with a 9% increase in unordered secondary structure and a 40% increase in hydrodynamic radius [4]. The αLP intermediate gave a ΔG_unf of 1 kcal/mol, which was 4 kcal/mol more stable than the native state [4]. The SGPB intermediate was characterized by a ΔG_unf of 0.5 kcal/mol but was less stable than the native state by 0.8 kcal/mol [5].

Given the unique properties of Rp, a thermodynamically stable 
[7], rigid [16], native-like yet inactive form, it is reasonable to 
suppose that Rp is a late-stage misfolded state that lacks the correct 
domain-domain interactions found in Np. The active site cleft of 
Np is formed between the N- and C-terminal lobes (Fig 2A), and it 
is possible that the PS catalyzes the correct formation of inter- 
domain contacts. This scenario is consistent with the relatively 
small changes in secondary and tertiary structure that accompany 
the Rp to Np transition [15]. This picture is also consistent with 
the concept of independent foldon units (in this case the N- and C-terminal domains), which fold independently, followed by a rate-limiting docking step [52]. In the case of pepsin, and perhaps other 
zymogen-derived proteins, the rate-limiting docking of domains is 
slow enough that a PS is required to act as a foldase. 

With the discovery of PS-assisted folding in a few evolutionarily unrelated serine peptidases, it was suggested that PS-catalyzed folding has developed through convergent evolution [11,13], a notion supported by the similarities that exist in the folding mechanisms of pepsin and αLP. Interestingly, the PS of pepsinogen is 44 residues in length, whereas the PS domains of αLP and SGPB are much longer, at 166 and 76 residues, respectively. The PS-catalyzed folding of pepsin yields a folding rate enhancement (k_cat/k_non-cat) far greater than that of SGPB, yet less than that of αLP (Figs 1A and 1B). Thus, PS length is not necessarily correlated with the power of the PS as a folding catalyst. This comparison also highlights that the pepsin PS is a highly efficient folding catalyst: at a mere 44 residues in length, it provides substantial TS stabilization per residue (Fig 3A), likely via a high density of strengthened native and/or non-native contacts.

Mutations to the PS of αLP were also found to have a greater impact on the PS-catalyzed folding rate than on binding the native or intermediate states, determined by measuring Ki and KM values, respectively [53]. PS mutations Y26F, E30A, and Y26F/E30A resulted in essentially no change in Ki or KM, while kcat/KM was reduced by a factor of 10, 2, and 60, respectively [53]. These findings reflect the fact that the PS binds most tightly to the folding TS of αLP, and such interactions may involve non-native contacts, as for pepsin.
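The link between rate enhancement and tight TS binding can be made explicit with a standard thermodynamic-cycle argument, assumed here for illustration and not necessarily how the cited affinities were obtained. Going from free PS + Rp to the PS-TS complex either by binding and then crossing the catalyzed barrier, or by crossing the uncatalyzed barrier and then binding, gives RT ln K_TS = RT ln K_ground − RT ln(k_cat/k_non-cat), that is, K_TS = K_ground · k_non-cat/k_cat. The function names, the 25°C default and any example values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def ts_dissociation_constant(kd_ground: float, k_noncat: float, k_cat: float) -> float:
    """Thermodynamic-cycle estimate K_TS = K_ground * k_noncat / k_cat.

    kd_ground: dissociation constant of the catalyst-substrate complex (e.g. PS-Rp)
    k_noncat, k_cat: first-order folding rate constants without / with bound PS
    """
    return kd_ground * k_noncat / k_cat


def extra_ts_stabilization(k_noncat: float, k_cat: float, temp_c: float = 25.0) -> float:
    """RT ln(k_cat / k_noncat): extra binding energy for the TS over the ground state (kcal/mol)."""
    return R_KCAL * (temp_c + 273.15) * math.log(k_cat / k_noncat)
```

For instance, a micromolar ground-state complex combined with a 10⁴-fold rate enhancement corresponds to a TS dissociation constant of about 10⁻¹⁰ M.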

The PS may 'buffer' the folding landscape via non-native interactions

The physiological basis for the existence of PS-catalyzed folding as a folding mechanism is not clear. For αLP, it was shown that the PS aids in the formation of a kinetically trapped native state that is highly rigid, thus conferring enhanced resistance to proteolysis on this extracellular serine peptidase [5,54]. In contrast, pepsin was shown to have a relatively flexible native conformation [16], and it is reasonable to conclude that pepsin has evolved a different mechanism of resistance to proteolysis: Np is kinetically stable at acidic pH, where most exogenous proteases would be inactivated, allowing for digestion by pepsin. Pepsin and αLP are both kinetically trapped and are thermodynamically metastable or unstable, respectively, yet these features may not be related, given that many thermodynamically stable proteins are at least as kinetically stable as pepsin or αLP, if not more so [5,55,56]. PS-catalyzed folding is not required to generate a kinetically stable fold.

Given the above considerations, there must be a more universal
role for PS-catalyzed folding. We hypothesize that PS-catalyzed
folding allows more destabilizing contacts to accumulate in the
native fold, thereby permitting a broader search of evolutionary
space that would otherwise be restricted by the loss of stability (e.g.,
this could result in novel functions or substrate specificities). This is
akin to the 'buffering capacity' that the chaperonins GroEL/ES
were shown to provide in enhancing protein evolvability [57,58].
Indeed, αLP [8] and pepsin [7] have thermodynamically unstable
and metastable native states, respectively, that would not exist
without PS-catalyzed folding. In such a scenario, the PS acts to
buffer or smooth [59] the folding landscape via non-native
interactions, catalyzing folding to a thermodynamically stable
PS-native state complex. Upon removal of the PS, these stabilizing
contacts are no longer available in the unfolding TS; thus, the
unfolding barrier is increased, yielding a kinetically trapped native
state.

Materials and Methods 

Materials 

Synthetic peptides were obtained from CanPeptide Inc. (Pointe-Claire,
QC, Canada) and were more than 95% pure as judged by LC-MS.
Peptides corresponding to the 44-residue PS domain of pepsinogen
were obtained in wild-type form and as the following single
mutants, in which the wild-type residue was replaced with alanine:
V4A, L6A, S11A, R13A, I17A, F25A, and K36A. Porcine pepsin A
(EC 3.4.23.1) was purchased from Sigma (St. Louis, MO, USA) and
used without further purification. Protein solutions were prepared
by mass (wt/vol), and the concentrations were determined from the
absorbance at 280 nm using extinction coefficients of
1490 M^-1 cm^-1 for the PS peptides and 52,830 M^-1 cm^-1 for
pepsin, estimated using the ProtParam tool [60]. Rp samples were
prepared by first denaturing pepsin: a 20 mg/ml solution was made
in 30 mM NaOH, with a final pH of 8, yielding alkaline-denatured
pepsin. Rp was then obtained by diluting an aliquot of the
alkaline-denatured protein to 0.35 mg/ml in 20 mM acetic
acid/NaOH buffer at pH 5.3.
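The A280-based concentration determination follows the Beer-Lambert law, c = A/(ε·l). The minimal sketch below applies it with the two extinction coefficients quoted above; the 1 cm path length and the absorbance reading in the usage line are assumptions for illustration, not values reported in this study.

```python
"""Estimate molar concentration from A280 via the Beer-Lambert law (sketch).

Assumes a 1 cm path length by default and that the ProtParam-derived
extinction coefficients quoted in the text apply under assay conditions.
"""

# Molar extinction coefficients at 280 nm (M^-1 cm^-1), as quoted in the text
EPSILON_280 = {
    "PS": 1490.0,       # prosegment peptides
    "pepsin": 52830.0,  # porcine pepsin A
}


def molar_concentration(a280: float, species: str, path_cm: float = 1.0) -> float:
    """Return concentration in mol/L from A = epsilon * c * l."""
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    return a280 / (EPSILON_280[species] * path_cm)


if __name__ == "__main__":
    # Hypothetical absorbance reading, for illustration only
    c_pepsin = molar_concentration(0.5283, "pepsin")
    print(f"pepsin: {c_pepsin * 1e6:.1f} uM")
```

Because the PS peptides have a low extinction coefficient (no Trp), small absorbance errors translate into proportionally larger concentration errors for the PS than for pepsin.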

Sequence alignment 

Sequences similar to that of porcine pepsinogen were identified 
using an NCBI-BLASTp search [61], and the results were limited 
to the 100 top scoring, non-redundant sequences. Multiple 
sequence alignment of this group was then performed using 
CINEMA 5 [62], in which a breakpoint was added to isolate the 
alignment of the PS domain from that of the mature domain. 

PS-catalyzed folding of Rp to Np 

PS-catalyzed folding was carried out by combining 1 µM Rp and
30 µM PS in a volume of 100 µl at pH 5.3 (20 mM acetic
acid/NaOH with 100 mM NaCl) and 15°C. Aliquots were taken at
several time intervals, diluted 20-fold in 50 mM phosphoric acid
buffer, pH 1.2, incubated for 5 min at 25°C, and assayed for Np
activity using the KPAEFF(NO2)AL substrate. The recovery of Np
activity with time, t, was fit with a monoexponential function,
$y = a - b\exp(-k_f t)$, to obtain the PS-catalyzed folding rate
constant, k_f. Under these conditions, PS binding to Rp reached
equilibrium within the dead time of mixing (<8 s), as judged by Trp
fluorescence.
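The monoexponential fit used to extract k_f can be sketched with SciPy's `curve_fit`. This is an illustration under stated assumptions: the source does not name its fitting software, time is taken in seconds, and the data in the usage block are synthetic and noise-free rather than measured.

```python
import numpy as np
from scipy.optimize import curve_fit


def monoexponential(t, a, b, kf):
    """Recovery of Np activity with time: y = a - b * exp(-kf * t)."""
    return a - b * np.exp(-kf * t)


def fit_folding_rate(t, activity, kf_guess=1e-3):
    """Fit activity-vs-time data; return (a, b, kf) and their 1-sigma errors."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(activity, dtype=float)
    # Initial guesses: plateau ~ max activity, amplitude ~ observed range
    p0 = (y.max(), y.max() - y.min(), kf_guess)
    popt, pcov = curve_fit(monoexponential, t, y, p0=p0)
    return popt, np.sqrt(np.diag(pcov))


if __name__ == "__main__":
    # Synthetic data with a HYPOTHETICAL kf = 2e-3 s^-1 (not a measured value)
    t = np.linspace(0, 3600, 25)  # seconds
    y = monoexponential(t, 1.0, 0.95, 2e-3)
    (a, b, kf), err = fit_folding_rate(t, y)
    print(f"kf = {kf:.2e} s^-1")
```

With real activity data, sampling should extend to several multiples of 1/k_f so that the plateau, a, is well defined; otherwise a and k_f become strongly correlated in the fit.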

PS binding to Rp 

The change in the intrinsic tryptophan fluorescence of pepsin was
used to measure the binding of PS to Rp and to determine the
dissociation constant, K_d. Rp solutions were diluted to between 0.6
and 1.2 µM in 20 mM acetic acid/NaOH buffer, pH 5.3, and
mixed with various amounts of PS. After the samples were incubated
at 20°C for 10 min, the intrinsic tryptophan fluorescence was
measured using a PTI spectrofluorophotometer (Photon Technology
International, Inc., Birmingham, NJ, USA), with excitation at
295 nm and emission measured at 315 nm. The change in
fluorescence, ΔF_i, at each PS concentration, [PS], was normalized
relative to the maximum change, ΔF_max, and fit according to

$$\frac{\Delta F_i}{\Delta F_{\max}} = \frac{([Rp] + [PS] + K_d) - \sqrt{([Rp] + [PS] + K_d)^2 - 4[Rp][PS]}}{2[Rp]} \qquad (1)$$

where [Rp] is the total concentration of pepsin.
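Equation 1 is the quadratic (ligand-depletion) binding isotherm, appropriate here because [Rp] is comparable to K_d. A minimal fitting sketch follows, assuming concentrations in µM, [Rp] held fixed at its known total value, and SciPy's `curve_fit` as the fitting routine (the source does not specify one); the usage data are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit


def fraction_bound(ps_total, kd, rp_total):
    """Eq 1: dF_i / dF_max for total [PS], total [Rp] and K_d (same units)."""
    s = rp_total + ps_total + kd
    return (s - np.sqrt(s**2 - 4.0 * rp_total * ps_total)) / (2.0 * rp_total)


def fit_kd(ps_total, frac, rp_total, kd_guess=1.0):
    """Fit normalized fluorescence changes for K_d; [Rp] is held fixed."""
    def model(ps, kd):
        return fraction_bound(ps, kd, rp_total)

    # K_d is constrained to be non-negative, which also keeps the sqrt real
    popt, pcov = curve_fit(model, ps_total, frac, p0=[kd_guess],
                           bounds=(0.0, np.inf))
    return popt[0], np.sqrt(pcov[0, 0])


if __name__ == "__main__":
    # Synthetic titration: 1 uM Rp, HYPOTHETICAL K_d = 2 uM
    ps = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0])  # uM
    frac = fraction_bound(ps, 2.0, 1.0)
    kd, kd_err = fit_kd(ps, frac, rp_total=1.0)
    print(f"K_d = {kd:.3f} uM")
```

Working in µM keeps the fitted parameter near unity, which helps the finite-difference Jacobian that `curve_fit` computes.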

PS binding to Rp was measured in buffer without the added
100 mM NaCl in order to obtain more accurate measurements of
K_d, owing to a larger change in signal in the absence of salt.
Additionally, without the added 100 mM NaCl it was possible to
isolate the PS-Rp binding step from the catalyzed folding step,
thereby improving the determination of K_d. Without 100 mM
NaCl, no generation of protease activity was observed on a
timescale of 0 to 2 hours (data not shown, and [7]), indicating that
the PS binds Rp yet does not catalyze folding to Np on this
timescale. However, 1H-15N TROSY NMR experiments [51]
indicated that PS-Rp does fold to the native complex in the absence
of 100 mM NaCl, but over longer timescales of days (Fig S7).

PS binding to Np 

PS binding to Np was determined by using the PS as a
competitive inhibitor and measuring the inhibition constant, K_I.
Hydrolysis of the KPAEFF(NO2)AL substrate was measured by
following the decrease in absorbance at 300 nm using a Biochrom
Ultrospec 3100pro UV-Vis spectrophotometer (Biochrom Ltd.,
Cambridge, England) in 20 mM acetic acid/NaOH buffer,
pH 5.3, containing 100 mM NaCl. Np samples were diluted to
10 nM, incubated with PS for 5 min at 20°C, and assayed for
activity. The reaction rates were normalized to the activity in the
absence of PS, and the data were fit using the competitive-inhibitor
form of the Michaelis-Menten equation

$$v_0 = \frac{V_{\max}[S]}{[S] + K_M\left(1 + [I]/K_I\right)} \qquad (2)$$

where v_0 is the initial reaction rate, V_max is the maximum reaction
rate, [S] is the substrate concentration (fixed at 0.1 mM), [I] is the
inhibitor (PS) concentration, K_M is the Michaelis constant, and K_I
is the inhibition constant.
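Dividing eq 2 by the uninhibited rate cancels V_max, so the normalized rate is ([S] + K_M)/([S] + K_M(1 + [I]/K_I)), which depends only on K_I once [S] and K_M are fixed. The sketch below fits K_I this way. The K_M value is a HYPOTHETICAL placeholder (the text does not report it), concentrations are in µM, SciPy's `curve_fit` is our choice of fitting tool, and the usage data are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit

S_UM = 100.0  # substrate concentration: 0.1 mM, as stated in the text
KM_UM = 50.0  # HYPOTHETICAL Michaelis constant; replace with a measured value


def relative_rate(inhibitor_um, ki_um, s_um=S_UM, km_um=KM_UM):
    """v0([I]) / v0([I] = 0) for a competitive inhibitor (eq 2; Vmax cancels)."""
    return (s_um + km_um) / (s_um + km_um * (1.0 + inhibitor_um / ki_um))


def fit_ki(inhibitor_um, rel_rate, ki_guess=1.0):
    """Fit normalized rates for K_I with [S] and K_M held fixed."""
    def model(i, ki):
        return relative_rate(i, ki)

    # Small positive lower bound avoids division by zero in the model
    popt, pcov = curve_fit(model, inhibitor_um, rel_rate, p0=[ki_guess],
                           bounds=(1e-9, np.inf))
    return popt[0], np.sqrt(pcov[0, 0])


if __name__ == "__main__":
    # Synthetic inhibition series with a HYPOTHETICAL K_I = 0.5 uM
    i = np.array([0.0, 0.1, 0.3, 1.0, 3.0, 10.0])  # uM PS
    v = relative_rate(i, 0.5)
    ki, ki_err = fit_ki(i, v)
    print(f"K_I = {ki:.3f} uM")
```

Note that the fitted K_I scales with the assumed K_M, so an accurate K_M for this substrate under the assay conditions is needed before the result is meaningful.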

CD spectroscopy 

CD data were collected using a Jasco J-810 spectropolarimeter
(Jasco Corp., Tokyo, Japan) over a wavelength range of 250 nm to
190 nm with 1 nm resolution, a 100 nm/min scan rate, a 0.25 s
response time, and four accumulated scans. Wild-type PS was
diluted to 0.1 mg/ml in either pure water or 20 mM acetic
acid/NaOH buffer at pH 5.3, with and without 100 mM NaCl, and
loaded into a cell with a 0.1 cm path length. Background spectra
were subtracted, and the sample spectra were converted to units of
mean residue ellipticity, MRE, using

$$\mathrm{MRE} = \frac{\mathrm{MRW} \times \theta_\lambda}{10 \times d \times c}$$

where MRW is the mean residue weight (molecular weight/number
of residues), θ_λ is the measured ellipticity at wavelength λ
(degrees), d is the path length (0.1 cm) and c is the protein
concentration (g/cm^3).
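The MRE conversion above is a per-wavelength scaling, sketched below. The function takes ellipticity in degrees as the text specifies (readings in millidegrees must be divided by 1000 first). The PS molecular weight and the ellipticity in the usage block are HYPOTHETICAL placeholders; only the 44 residues, 0.1 mg/ml concentration and 0.1 cm path length come from the text.

```python
def mean_residue_ellipticity(theta_deg, mol_weight, n_residues, path_cm,
                             conc_g_per_cm3):
    """MRE = MRW * theta / (10 * d * c), in deg cm^2 dmol^-1.

    theta_deg: measured ellipticity in degrees (convert mdeg to deg first)
    mol_weight: molecular weight in g/mol; n_residues: number of residues
    path_cm: cell path length in cm; conc_g_per_cm3: concentration (g/ml)
    """
    mrw = mol_weight / n_residues  # mean residue weight
    return mrw * theta_deg / (10.0 * path_cm * conc_g_per_cm3)


if __name__ == "__main__":
    # 0.1 mg/ml = 1e-4 g/cm^3 PS in a 0.1 cm cell (from the text);
    # mol_weight and theta_deg are HYPOTHETICAL illustration values.
    mre = mean_residue_ellipticity(theta_deg=-0.010, mol_weight=5000.0,
                                   n_residues=44, path_cm=0.1,
                                   conc_g_per_cm3=1e-4)
    print(f"MRE = {mre:.0f} deg cm^2 dmol^-1")
```

The function also works element-wise on NumPy arrays, so a whole background-subtracted spectrum can be converted in one call.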

Calculation of Φ-values from PS-catalyzed folding and binding constants

The change in stability of the folding transition state upon 
mutation was calculated using 

$$\Delta\Delta G^{\ddagger} = -RT \ln\left(k_f^{mut}/k_f^{wt}\right) \qquad (3)$$

where k_f^wt and k_f^mut are the PS-catalyzed folding rate constants
of the wild-type and mutant PS peptides, respectively, R is the gas
constant, and T is the absolute temperature. The change in
equilibrium stability of PS-Np relative to PS-Rp upon mutation
of the PS, ΔΔG_PS(Np-Rp), was determined as the difference between
the changes in binding energies, using

$$\Delta\Delta G_{bind} = -RT \ln\left(K_d^{wt}/K_d^{mut}\right) \qquad (4)$$

and 

$$\Delta\Delta G_{PS(Np-Rp)} = \Delta\Delta G_{PS\text{-}Np} - \Delta\Delta G_{PS\text{-}Rp} \qquad (5)$$

Here, ΔΔG_bind refers to either ΔΔG_PS-Rp, determined from K_d
values, or ΔΔG_PS-Np, determined from K_I values. The Φ-value for
each mutant is eq 3 divided by eq 5, Φ = ΔΔG‡/ΔΔG_PS(Np-Rp).
Additional details are included in the Supporting Information
(Text S1).
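Equations 3 to 5 combine into a short calculation, sketched below. The sketch assumes a single temperature for all terms for simplicity (in the text, folding was measured at 15°C and binding at 20°C), reports energies in kcal/mol, and uses hypothetical inputs; only ratios of rate and equilibrium constants enter, so their units cancel.

```python
import math

# Gas constant in kcal mol^-1 K^-1 (8.314462618 J mol^-1 K^-1 / 4184 J/kcal)
R_KCAL = 8.314462618 / 4184.0


def ddg_ts(kf_wt: float, kf_mut: float, temp_k: float) -> float:
    """Eq 3: change in folding TS stability, -RT ln(kf_mut / kf_wt)."""
    return -R_KCAL * temp_k * math.log(kf_mut / kf_wt)


def ddg_bind(k_wt: float, k_mut: float, temp_k: float) -> float:
    """Eq 4: change in binding energy, -RT ln(K_wt / K_mut), for K_d or K_I."""
    return -R_KCAL * temp_k * math.log(k_wt / k_mut)


def phi_value(kf_wt, kf_mut, kd_wt, kd_mut, ki_wt, ki_mut, temp_k):
    """Phi = eq 3 / eq 5, using one temperature for all terms (simplification)."""
    ddg_ps_rp = ddg_bind(kd_wt, kd_mut, temp_k)  # PS-Rp, from K_d
    ddg_ps_np = ddg_bind(ki_wt, ki_mut, temp_k)  # PS-Np, from K_I
    ddg_eq = ddg_ps_np - ddg_ps_rp               # eq 5
    if abs(ddg_eq) < 1e-9:
        raise ValueError("ddG_PS(Np-Rp) is ~0; Phi is undefined")
    return ddg_ts(kf_wt, kf_mut, temp_k) / ddg_eq


if __name__ == "__main__":
    # HYPOTHETICAL inputs (not values from this study): the mutation halves
    # k_f and weakens PS-Np binding two-fold without changing PS-Rp binding.
    T = 288.15  # 15 degC
    print(phi_value(kf_wt=1.0, kf_mut=0.5, kd_wt=1.0, kd_mut=1.0,
                    ki_wt=1.0, ki_mut=2.0, temp_k=T))
```

In this example the TS is destabilized exactly as much as PS-Np relative to PS-Rp, giving Φ = 1. When ΔΔG_PS(Np-Rp) is small, Φ becomes very sensitive to measurement error, which is why the guard rejects a near-zero denominator.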

Supporting Information 

Figure S1 PS-catalyzed folding approach to Φ-value
analysis. The effect of a mutation on each step of the folding
landscape was determined separately by measuring PS-catalyzed
folding and binding affinities rather than by directly measuring the
equilibrium stability of PS-Np relative to PS-Rp, ΔΔG_PS(Np-Rp).
The relative changes in binding affinities gave ΔΔG_PS(Np-Rp), while
ΔΔG‡ was obtained directly from the relative folding rates.
(TIF) 

Figure S2 Structure of the PS domains of pepsinogen
(PDB: 3PSG) and progastricsin (PDB: 1HTR). Ribbon
diagram showing selected residues of pepsinogen (red side chains,
black backbone) and progastricsin (blue side chains, grey
backbone), starting from the N-terminus (pepsinogen numbering):
V4, L6, R8, S11, R13, I17, F25 and K36. The overall fold is very
similar, with an average RMSD of 1.24 Å; in particular, for the
conserved residues L6 and R13 and the semi-conserved V4, S11
and K36, the structures are nearly identical, with RMSD <1 Å.
(TIF) 

Figure S3 Sequence alignment of the PS domain of 
pepsinogen with the nearest 100 sequences. The first 
sequence from the top is porcine pepsinogen (PEPA_PIG). 
(TIF) 

Figure S4 Determination of k_f, K_d and K_I for
PS-catalyzed folding, PS-Rp binding and PS-Np binding,
respectively. Data and fit curves are the same as those shown in
Fig 2B, 2C and 2D in the main text, and are plotted individually for
the wild-type and each mutant PS for clarity.
(TIF) 

Figure S5 Temperature dependence of PS-catalyzed
folding of Rp to Np. Rates were measured at 0, 5, 10 and 15°C
using the recovery of Np activity. Data points are the mean ± SD of
at least three determinations. The linear fit is also indicated.
(TIF) 

Figure S6 CD spectra of the prosegment. Far-UV CD spectra
of wild-type PS in ddH2O and in 20 mM acetic acid/NaOH, pH 5.3,
with and without 100 mM NaCl added. A negative band at
198 nm is characteristic of random coil structure [Sreerama N,
Venyaminov SY, Woody RW (2000) Estimation of protein
secondary structure from circular dichroism spectra: inclusion of
denatured proteins with native proteins in the analysis. Anal
Biochem 287: 243-251].
(TIF) 

Figure S7 PS-Rp folds to an identical native conformation
with and without added salt. 1H-15N TROSY NMR
spectra were collected for samples of PS combined with Rp in
buffers (A) containing 100 mM NaCl and (B) without the added
100 mM NaCl. (C) Overlay of the two spectra. The buffer was
20 mM NaOAc, pH 5.3, with 10% D2O, at 22°C. The NMR
experimental details were published previously [51].
(TIF) 

Text S1 Calculation of Φ-values from PS-catalyzed folding and binding constants.

(DOCX) 

Text S2 Temperature dependence of PS-catalyzed folding.

(DOCX) 